1,693 research outputs found
Deep Convolutional Neural Fields for Depth Estimation from a Single Image
We consider the problem of depth estimation from a single monocular image in
this work. It is a challenging task as no reliable depth cues are available,
e.g., stereo correspondences, motions, etc. Previous efforts have been focusing
on exploiting geometric priors or additional sources of information, with all
using hand-crafted features. Recently, there is mounting evidence that features
from deep convolutional neural networks (CNN) are setting new records for
various vision applications. On the other hand, considering the continuous
characteristic of the depth values, depth estimations can be naturally
formulated into a continuous conditional random field (CRF) learning problem.
Therefore, we in this paper present a deep convolutional neural field model for
estimating depths from a single image, aiming to jointly explore the capacity
of deep CNN and continuous CRF. Specifically, we propose a deep structured
learning scheme which learns the unary and pairwise potentials of continuous
CRF in a unified deep CNN framework.
The proposed method can be used for depth estimations of general scenes with
no geometric priors nor any extra information injected. In our case, the
integral of the partition function can be analytically calculated, thus we can
exactly solve the log-likelihood optimization. Moreover, solving the MAP
problem for predicting depths of a new image is highly efficient as closed-form
solutions exist. We experimentally demonstrate that the proposed method
outperforms state-of-the-art depth estimation methods on both indoor and
outdoor scene datasets.Comment: fixed some typos. in CVPR15 proceeding
Discriminative Training of Deep Fully-connected Continuous CRF with Task-specific Loss
Recent works on deep conditional random fields (CRF) have set new records on
many vision tasks involving structured predictions. Here we propose a
fully-connected deep continuous CRF model for both discrete and continuous
labelling problems. We exemplify the usefulness of the proposed model on
multi-class semantic labelling (discrete) and the robust depth estimation
(continuous) problems.
In our framework, we model both the unary and the pairwise potential
functions as deep convolutional neural networks (CNN), which are jointly
learned in an end-to-end fashion. The proposed method possesses the main
advantage of continuously-valued CRF, which is a closed-form solution for the
Maximum a posteriori (MAP) inference.
To better adapt to different tasks, instead of using the commonly employed
maximum likelihood CRF parameter learning protocol, we propose task-specific
loss functions for learning the CRF parameters.
It enables direct optimization of the quality of the MAP estimates during the
course of learning.
Specifically, we optimize the multi-class classification loss for the
semantic labelling task and the Turkey's biweight loss for the robust depth
estimation problem.
Experimental results on the semantic labelling and robust depth estimation
tasks demonstrate that the proposed method compare favorably against both
baseline and state-of-the-art methods.
In particular, we show that although the proposed deep CRF model is
continuously valued, with the equipment of task-specific loss, it achieves
impressive results even on discrete labelling tasks
CRF Learning with CNN Features for Image Segmentation
Conditional Random Rields (CRF) have been widely applied in image
segmentations. While most studies rely on hand-crafted features, we here
propose to exploit a pre-trained large convolutional neural network (CNN) to
generate deep features for CRF learning. The deep CNN is trained on the
ImageNet dataset and transferred to image segmentations here for constructing
potentials of superpixels. Then the CRF parameters are learnt using a
structured support vector machine (SSVM). To fully exploit context information
in inference, we construct spatially related co-occurrence pairwise potentials
and incorporate them into the energy function. This prefers labelling of object
pairs that frequently co-occur in a certain spatial layout and at the same time
avoids implausible labellings during the inference. Extensive experiments on
binary and multi-class segmentation benchmarks demonstrate the promise of the
proposed method. We thus provide new baselines for the segmentation performance
on the Weizmann horse, Graz-02, MSRC-21, Stanford Background and PASCAL VOC
2011 datasets
A Computational Model of the Short-Cut Rule for 2D Shape Decomposition
We propose a new 2D shape decomposition method based on the short-cut rule.
The short-cut rule originates from cognition research, and states that the
human visual system prefers to partition an object into parts using the
shortest possible cuts. We propose and implement a computational model for the
short-cut rule and apply it to the problem of shape decomposition. The model we
proposed generates a set of cut hypotheses passing through the points on the
silhouette which represent the negative minima of curvature. We then show that
most part-cut hypotheses can be eliminated by analysis of local properties of
each. Finally, the remaining hypotheses are evaluated in ascending length
order, which guarantees that of any pair of conflicting cuts only the shortest
will be accepted. We demonstrate that, compared with state-of-the-art shape
decomposition methods, the proposed approach achieves decomposition results
which better correspond to human intuition as revealed in psychological
experiments.Comment: 11 page
Cross-convolutional-layer Pooling for Image Recognition
Recent studies have shown that a Deep Convolutional Neural Network (DCNN)
pretrained on a large image dataset can be used as a universal image
descriptor, and that doing so leads to impressive performance for a variety of
image classification tasks. Most of these studies adopt activations from a
single DCNN layer, usually the fully-connected layer, as the image
representation. In this paper, we proposed a novel way to extract image
representations from two consecutive convolutional layers: one layer is
utilized for local feature extraction and the other serves as guidance to pool
the extracted features. By taking different viewpoints of convolutional layers,
we further develop two schemes to realize this idea. The first one directly
uses convolutional layers from a DCNN. The second one applies the pretrained
CNN on densely sampled image regions and treats the fully-connected activations
of each image region as convolutional feature activations. We then train
another convolutional layer on top of that as the pooling-guidance
convolutional layer. By applying our method to three popular visual
classification tasks, we find our first scheme tends to perform better on the
applications which need strong discrimination on subtle object patterns within
small regions while the latter excels in the cases that require discrimination
on category-level patterns. Overall, the proposed method achieves superior
performance over existing ways of extracting image representations from a DCNN.Comment: Fixed typos. Journal extension of arXiv:1411.7466. Accepted to IEEE
Transactions on Pattern Analysis and Machine Intelligenc
Structured Learning of Tree Potentials in CRF for Image Segmentation
We propose a new approach to image segmentation, which exploits the
advantages of both conditional random fields (CRFs) and decision trees. In the
literature, the potential functions of CRFs are mostly defined as a linear
combination of some pre-defined parametric models, and then methods like
structured support vector machines (SSVMs) are applied to learn those linear
coefficients. We instead formulate the unary and pairwise potentials as
nonparametric forests---ensembles of decision trees, and learn the ensemble
parameters and the trees in a unified optimization problem within the
large-margin framework. In this fashion, we easily achieve nonlinear learning
of potential functions on both unary and pairwise terms in CRFs. Moreover, we
learn class-wise decision trees for each object that appears in the image. Due
to the rich structure and flexibility of decision trees, our approach is
powerful in modelling complex data likelihoods and label relationships. The
resulting optimization problem is very challenging because it can have
exponentially many variables and constraints. We show that this challenging
optimization can be efficiently solved by combining a modified column
generation and cutting-planes techniques. Experimental results on both binary
(Graz-02, Weizmann horse, Oxford flower) and multi-class (MSRC-21, PASCAL VOC
2012) segmentation datasets demonstrate the power of the learned nonlinear
nonparametric potentials.Comment: 10 pages. Appearing in IEEE Transactions on Neural Networks and
Learning System
- …